A lossless text compression technique using syllable based morphology

Authors

  • Ibrahim Akman
  • Hakan Bayindir
  • Serkan Ozleme
  • Zehra Akin
  • Sanjay Misra
Abstract

In this paper, we present a new lossless text compression technique that exploits the syllable-based morphology of multi-syllabic languages. The proposed algorithm partitions words into their syllables and then produces shorter bit representations of those syllables for compression. The method has six main components: source file, filtering unit, syllable unit, compression unit, dictionary file, and target file. The number of bits used to code a syllable depends on the number of entries in the dictionary file. The algorithm was implemented and tested on 20 texts of different lengths collected from different fields. The results indicated compression of up to 43%.
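The dependence of code length on dictionary size can be illustrated with a minimal Python sketch. It is not the authors' implementation: the vowel-based splitter, the function names, and the fixed-width code of ceil(log2(N)) bits for a dictionary of N entries are all assumptions made for illustration, and a real syllable unit would follow the rules of the target language.

```python
import re

# Assumption: a deliberately naive splitter. Each chunk is a run of consonants
# followed by a run of vowels; leftover trailing consonants form their own chunk.
_TOKEN = re.compile(r"[A-Za-z]+|[^A-Za-z]+")
_CHUNK = re.compile(r"[^aeiou]*[aeiou]+|[^aeiou]+", re.IGNORECASE)


def split_units(text):
    """Split text into syllable-like chunks; non-letter runs stay whole."""
    units = []
    for token in _TOKEN.findall(text):
        if token.isascii() and token.isalpha():
            units.extend(_CHUNK.findall(token))
        else:
            units.append(token)  # spaces, digits, punctuation, etc.
    return units


def code_width(n_entries):
    """Bits per code for n_entries dictionary entries: ceil(log2(n)), at least 1."""
    return max(1, (n_entries - 1).bit_length())


def compress(text):
    """Return (dictionary, bit string); the bit string is kept as text for readability."""
    units = split_units(text)
    dictionary = list(dict.fromkeys(units))  # unique units, first-seen order
    index = {unit: i for i, unit in enumerate(dictionary)}
    width = code_width(len(dictionary))
    bits = "".join(format(index[unit], f"0{width}b") for unit in units)
    return dictionary, bits


def decompress(dictionary, bits):
    """Invert compress() by reading fixed-width codes back from the bit string."""
    width = code_width(len(dictionary))
    return "".join(
        dictionary[int(bits[i:i + width], 2)] for i in range(0, len(bits), width)
    )
```

Calling `decompress(*compress(text))` returns the original text, which is what makes the scheme lossless. The sketch ignores the cost of storing the dictionary itself, so it says nothing about the compression ratios reported in the paper.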


Similar resources

An Enhanced Short Text Compression Scheme for Smart Devices

Short text compression is a great concern for data engineering and management. The rapidly growing use of small devices, especially mobile phones and wireless sensors, has turned short text compression into a demand of the time. In this paper, we propose an approach for compressing short English text for smart devices. The prime objective of the proposed technique is to establish a low-complexity lossless...


A new compression technique for binary text images

In this paper, a new lossless binary text image coding technique based on overlapping partitioning is presented. In this technique, the black regions in the image are first partitioned into a number of fully overlapping and non-overlapping rectangles. After partitioning, the two opposite vertices of each rectangle are compressed using a simple encoding technique. It has been demonstrated that the c...


An Enhanced Static Data Compression Scheme Of Bengali Short Message

This paper concerns a modified approach to compressing short Bengali text messages for small devices. The prime objective of this research is to establish a low-complexity compression scheme suitable for small devices having small memory and relatively lower processing speed. The basic aim is not to compress text of any size up to its maximum level without having any constraint on space...


Genetic Algorithms in Syllable-Based Text Compression

Syllable-based text compression is a new approach to compression by symbols. In this concept, syllables are used as the compression symbols instead of the more common characters or words. This new technique has proven itself worthy, especially on short to middle-length text files. The effectiveness of the compression is greatly affected by the quality of dictionaries of syllables characteristic f...


The Novel Lossless Text Compression Technique Using Ambigram Logic and Huffman Coding

The new era of networking is looking forward to improved and effective methods of channel utilization. There are many texts for which lossless data recovery is essential because of the importance of the information they hold. Therefore, a lossless decomposition algorithm which is independent of the nature and pattern of text is today's top concern. Efficiency of algorithms used today varies grea...



Journal title:
  • Int. Arab J. Inf. Technol.

Volume 8, Issue

Pages -

Publication date 2011